-
Notifications
You must be signed in to change notification settings - Fork 578
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
✨ Add modEventDivertBlobs event to send blobs to abyss #2238
Conversation
} | ||
|
||
private async getBlob(opts: { did: string; cid: string }) { | ||
// TODO: Is this safe to do or should we be reaching out to the pds instead? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need some help here figuring out the right way to get the blob.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@devinivy might be able to fill in with typescript-specific tricks... maybe there is a helper for this already? You could look in the appview code, in the auto-labeler section.
The general flow is to resolve the DID doc (probably from a local cache, or even just passing the full doc through to this method from elsewhere); get the account's PDS from there; make an un-authenticated request to the getBlobs
endpoint, with appropriate HTTP retries and timeout. For correctness, then verify the hash of the content bytes against the blob CID.
status: number, | ||
data: Record<string, boolean>, | ||
) => | ||
nock(BLOB_DIVERT_SERVICE_HOST) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't use this pattern of intercepting http reqs anywhere in the codebase so wondering if it's ok to introduce this here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I often avoid these kinds of "live" mocking libraries because they tend to create some weird interactions. I think we can give it a whirl and if it shows any signs of weirdness, not have any qualms about ripping it out.
It seems like we could pretty easily mock it in the application by placing a method on BlobDiverter
responsible for getting this response, and instead just mock-out that method. Node now comes with pretty awesome built-in mocking which makes this quite easy to do: https://nodejs.org/api/test.html#class-mocktracker
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool ended up using the jest.fn mocks.
Some high-level comments:
|
I'm also fine with
Yep, that's what happens right now, we expect the event to send us
Yeah I'd go for multiple event approach but just let the backend emit a consecutive takedown event when divert is received. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a high level, looking great!
subjectDid: string | ||
subjectBlobCid: string | ||
subjectUri: string | ||
divertedAt: Date | null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can name this confirmedAt
to keep it consistent with repo_push_event
, record_push_event
, blob_push_event
. The idea is that once the event has been accepted by the other service it's "confirmed."
return false | ||
} | ||
|
||
const url = `${this.serviceConfig.url}?did=${subjectDid}&uri=${subjectUri}` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will want to ensure these parts get %-encoded. The most reliable way to do it is probably to build a URL
.:
const url = new URL(this.serviceConfig.url)
url.searchParams.set('did', subjectDid)
url.searchParams.set('uri', subjectUri)
method: 'POST', | ||
data: imageStream, | ||
headers: { | ||
Authorization: this.serviceConfig.authToken, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we wouldn't need to configure the whole authorization header including the auth scheme. Should we be using service auth here, a bearer token, or something else? (CC @bnewbold)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
abyss
wants HTTP Basic auth, with username admin
, and the password as configured "token".
I don't think we have service-auth implemented generically in golang yet, so not super quick to add that (there are other golang clients for this service as well, so need both client+server impls). If we can keep it HTTP Basic that is best from my perspective. I don't know how the axios
HTTP library works, maybe we could even encode the password in the URL so there is a single var to configure? abyss
is generally not really set up as a regular atproto service, it is most like cardyb and other microservices (one diff is that abyss has pseudo-lexicon endpoints, though the golang side doesn't actually use the lexicon codegen or validation).
if (!this.serviceConfig.authToken || !this.serviceConfig.url) { | ||
throw new Error('Blob divert service not configured') | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there's no service config, can we avoid creating a BlobDiverter
at all? It is a bit awkward to have to check both serviceConfig.authToken
and serviceConfig.url
in each method and opens the question about how each method should behave if they aren't present.
packages/ozone/src/config/config.ts
Outdated
export type BlobReportServiceConfig = { | ||
url?: string | ||
authToken?: string | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like both url
and authToken
are required in order to use the blob report service. Could we make these both required here, and update OzoneConfig
to allow it not to be configured:
export type OzoneConfig = {
// ...
blobReportService: BlobReportServiceConfig | null
}
@@ -0,0 +1,229 @@ | |||
import { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to toss out an idea— the blob_push_event
has an eventType
. Could we make blob diversions a new event type and avoid needing to add a new table, new part of the daemon that needs the polling machinery, etc.? I think the blob_push_event
is already setup decently well for this situation, may just need to handle the eventType
. I could see most of the BlobDiverter
class sticking around to encapsulate the specifics about how the blob is pushed out, but not need to handle the polling, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I like that approach better, refactored the code to use that table instead.
if ( | ||
isModEventDivert(event) && | ||
subjectInfo.subjectUri && | ||
subjectInfo.subjectBlobCids?.length | ||
) { | ||
const blobDiverts = await Promise.all( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For other event types we don't take the effect of the event in logEvent()
. Instead there's a method that handles it, which gets called within com.atproto.admin.emitModerationEvent
after logging the event. For consistency maybe we should move this into that pattern rather than have this outlier.
const uploadResult = await retryHttp(() => | ||
this.uploadBlob( | ||
{ imageStream, contentType }, | ||
{ subjectDid, subjectUri }, | ||
), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The imageStream
is stateful so it's not going to be super retryable if it gets partially used-up. It's possible that this.getBlob()
and this.uploadBlob()
should go in the same retry block.
…roto into divert-blobs
import { subjectFromInput } from '../../mod-service/subject' | ||
import { ModerationLangService } from '../../mod-service/lang' | ||
import { retryHttp } from '../../util' | ||
import { ModeratorOutput, RoleOutput } from '../../auth-verifier' | ||
|
||
const handleModerationEvent = async ({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refactored this into a function so that we can execute the entire flow again after divert event.
content, | ||
recipientDid: subject.did, | ||
}), | ||
if (isModEventDivert(event) && subject.isRecord()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sync divert blob sequence here before logging the event.
this way, the moderator will immediately know that the diversion failed and the event won't be logged and won't be takendown.
}) | ||
|
||
// On divert events, we need to automatically take down the blobs | ||
if (isModEventDivert(input.body.event)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
emit takedown event following divert event.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One event expanding into two events on the backend is a little funky. If we find ourselves needing to do this again it might be worth revisiting, seeing if there's a nice way to think about this.
) | ||
.returning(['id', 'subjectDid', 'subjectBlobCid', 'eventType']) | ||
.execute() | ||
const blobEvts = await this.eventPusher.logBlobPushEvent( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unrelated to the current implementation but felt nice to keep this refactoring.
subjectDid: subject.did, | ||
subjectUri: subject.uri || null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The model seems to expect a subjectUri but it wasn't being passed here.
This PR adds a new mod event that sends all linked blobs to a record to an external service for performing further analysis on the blobs.
The new event expects an array of
subjectBlobCids
that will be forwarded to the external service.A couple of open questions
subjectBlobCids
or default to ALL blobs in the record if none are specified?